ABSTRACT

Two questions are addressed in this document: What is worth knowing about Project Follow Through? And how should the National Institute of Education (NIE) evaluate the Follow Through program? Discussion of the first question focuses on findings of past Follow Through evaluations, problems associated with the use of experimental design and statistics, and prospects for discovering new knowledge about the program. With respect to the second question, it is suggested that NIE should conduct evaluation emphasizing an ethnographic, principally descriptive case-study approach to enable informed choice by those involved in the program. The discussion is based on the following assumptions: (1) Past evaluations of Follow Through have been quantitative, experimental approaches to deriving value judgments; (2) The deficiencies of quantitative, experimental evaluation approaches are so thorough and irreparable as to disqualify their use; (3) There are probably at most a half-dozen important approaches to teaching children, and these are already well-represented in existing Follow Through models; and (4) The audience for Follow Through evaluations is an audience of teachers to whom appeals to the need for accountability of funds or the rationality of science are largely irrelevant. Appended to the discussion are Cronbach's 95 theses about the proper roles, methods, and uses of evaluation. Theses running counter to a federal model of program evaluation are asterisked. (RH)
What is worth knowing about Follow Through in light of 1) past studies, 2) current technical capability of measurement and experimental analysis, and 3) the impact of such knowledge once found? How should the National Institute of Education fulfill its new responsibility to evaluate Follow Through? These are questions that may lead to unanticipated and uncomfortable answers (unanticipated by NIE, perhaps, and uncomfortable to the technical experts, the psychologists, statisticians and the like, with whom NIE sees itself allied in the manufacture of knowledge).

Our theses before our arguments: 
1. Past evaluations of Follow Through were quantitative and experimental. They created more dissent than consensus; they changed few minds. We will not endorse, for example, what those who wrote the October 1, 1980, NIE planning document believe was proved by the SRI/Abt evaluation, namely, that "Models that emphasized basic skills produced more gains in those areas and in self concept than other models" (p. 3). "Models" of compensatory education are minor influences in pupils' development. Far more important in children's growth are their native endowment, their health, how their parents and siblings treat them, and other influences not controlled by schools.
2. The deficiencies of quantitative, experimental evaluation approaches like those continually pressed on the federal government (e.g., see the "Planning Information Study for Future Follow Through Experiments" produced by a team at the University of Georgia, 25 May 1979) are thorough and irreparable. The problem lies less with experimental designs for assessing causal impact (the state of this art is adequate) than with the impossibility of translating complex, subtle and vague notions of child development and education into tests for mass administration. And the notion that this problem will be resolved with matrix sampling, logistic item models, factor analysis and the like merely betrays a shallow understanding of the real difficulties faced by those who wish to "quantify" education.

3. There are not 22 models of compensatory education (though there are clearly that many and more groups of professionals who can put together a "program" and write a grant proposal), nor are there dozens of "new" models waiting to be discovered. There are probably at most a half-dozen importantly different approaches to teaching children, and these are already well-represented in existing Follow Through models. A gimmick isn't a model; nor is a model an enthusiasm of a researcher recently emerged from the laboratory with a scientific solution to the problem of why some children learn uncertainly or not at all.

4. The proper audience for Follow Through evaluations is teachers, ignoring for the moment the bought audience of scholars and researchers who write and read Follow Through reports only when they are paid to do so. Teachers do not heed the statistical findings of experiments when deciding how best to educate children. (Nor do you, reader, study experiments that tell you how best to evaluate Follow Through.) They decide such matters on the basis of complicated public and private understandings, beliefs, motives and wishes. They have the right and good reasons so to decide, and neither that right nor those reasons are changed one whit by appeals to the need for accountability for public funds or the rationality of science.

WHAT IS WORTH KNOWING ABOUT FOLLOW THROUGH?
1. Findings of Past Follow Through Evaluations.

During the 1970's, the US Office of Education spent about $20 million evaluating Follow Through. The USOE/SRI/Abt evaluation of Follow Through has been judged defective in many respects (House, Glass, McLean & Walker, 1978). Follow Through models were classified in misleading ways. Outcome measures were adequate only for assessing the simplest mechanical skills. Attempts to measure progress toward other than the most narrow academic goals were unsuccessful. The evaluation was unfair to models that concentrated on goals beyond simple academic performance. The Follow Through evaluation proved only that differences in models, even on the few simple outcomes measured, were trivially small in comparison to the large differences among sites for the same model. The tiny differences among models that did exist skipped around perplexingly depending on how one resolved any one of several nuances of statistical analysis (Camilli, 1980; Bereiter and Kurland, 1980).

House and his colleagues (1978) drew these lessons from the costly experience:
"The truth about Follow Through is complex. No simple answer to the problem of educating disadvantaged students has been found. Even with the narrow outcome measures employed, models that worked well in one town worked poorly in another. Unique features of the local setting had more effect on test scores than did the models. This does not mean that federal programs are useless or inappropriate for pursuing national objectives; however, many of the most significant factors affecting educational achievement lie outside the control of federal officials. Educational policy makers should expect that the same program may have quite different effects in different settings." (p. 156) "Enough experience exists to suggest that these massive experiments with narrow outcome measures are bad investments. The results are highly equivocal, and groups such as sponsors and parents [...] because their goals and interests are not represented in the evaluation. A pluralistic society requires a variety of evaluation criteria and approaches. Groups that are significantly affected by an evaluation must have their interests reflected in the valuative criteria or they will perceive the results as illegitimate." (p. 158)

"An even broader question is whether the federal government should be advocating particular models of instruction at all. On the basis of previous experience and this evaluation, we think not. Government advocacy of particular models assumes both the feasibility of wide implementation and the similarity of effects in different locales. However flawed, this evaluation does suggest these assumptions are contrary to fact. When combined with the experience of other government programs, the evidence is strong that educational improvement strategies that acknowledge local circumstances will be far more effective in the long run." (p. 158)

In much of the debate about the effectiveness of Follow Through, the role of the model sponsors has been neglected. Though some sponsors have strongly contested the results of the Abt evaluation (Stebbins et al., 1977), we believe the interests of the sponsors have not been sufficiently clear. Recently, however, they published a report in which their stake in the Follow Through endeavor is aired (Hodges et al., 1980). We will not review this document in depth here, but will discuss a few of their recommendations.
Regarding the outcome of the Follow Through experiment, the model sponsors observed that

These (i.e., Follow Through) successes are impressive, but they are not sufficient. Promising approaches to teaching disadvantaged children have been demonstrated, but still too little is known about how to make schooling effective and pleasant for large numbers of children. Many children are still performing well below their potential. The long term effects of intervention in the primary grades are not known. The effectiveness of the components of several instructional approaches was not revealed. Many questions remain about how to insure that good instructional practices become widespread. The impediments to implementation of systematic educational approaches are still present and not yet fully understood. (p. 73)



The model sponsors stated further that 

The implementation problems must be solved since it is apparent that many people are unhappy with the way the schools presently serve economically disadvantaged children. Those who participated in Follow Through as model sponsors believe that change cannot come from within the local schools on any major scale and that incentives for change more powerful than those presently available must be provided. The paradox is apparent: the literature on change recognizes that change must be desired by those within the system, but experience [...] reveals that those within the system who require [...] more support and knowledge than can possibly come from within. An immediate solution to this paradox is not apparent. Model sponsorship is only one possible avenue. (p. 74)

The model sponsors stress the need to understand complex relationships within a community as well as between a community and an educational program. Thus the sponsors have recommended that 1) "Information on the status of a school system prior to the initiation of an intervention is needed," and 2) "The data demonstrate that communities differ radically. More information is needed on how to identify and index these differences to determine how they affect the implementation of a model." We also want to note that the sponsors have recommended that 3) "State education agencies should be involved in educational changes in the schools at a more meaningful level than that mandated by current Follow Through regulations," and 4) "Federal government decision making concerning large-scale programs like Follow Through must become more timely and better coordinated."

We agree with the suggestion that more attention to the particular conditions within a community is likely to result in a richer description of an educational program and thus lead to a more useful kind of knowledge. But we sense that there is more to the sponsors' position as stated. There seems to be the implication that the variety occurring within communities can be characterized by an underlying model rather than by accumulated experience. Assuming that educational settings conform to this model in a systematic fashion, the problem is to establish enough of the theory to improve educational practice. Moreover, we hear a tacit approval of large-scale educational evaluations with the corresponding "assistance" of federal and state agencies. "Stronger incentives" to change from the outside can also become a means of unwanted intervention, or worse, imposition.
2. Experimental Design & Statistics. 

Our low estimation of the importance of quantitative experimental methods in the evaluation of Follow Through is not a reflection of a more general attitude of suspicion about the validity of these methods. With the exception of the concern lavished on statistical inference* by USOE/SRI/Abt evaluators, the methods of design and analysis used in the past were generally adequate to the purpose; the purpose is now outdated and inappropriate.

*Inferential statistical concerns have been over-emphasized in past FT evaluations. Alpha levels make sense either when based on explicit probabilistic sampling (as in well-done surveys) or when based on randomization in experiments (thus providing a permutation interpretation of alpha levels). Without either (the situation in Follow Through evaluation), statistical inference means little; it merely gives a false sense of confidence in the findings and draws attention from the more complex questions of generalization that statistics will not solve.
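The footnote's point about randomization can be made concrete. Below is a minimal sketch of a two-sided permutation test: when pupils really were randomized, an alpha level can be read as the share of possible re-assignments of the same scores that produce a difference at least as large as the one observed. The function name, its parameters and the use of random re-assignments (rather than enumerating every split) are our illustrative assumptions, not a procedure used in any Follow Through evaluation.

```python
import random
from statistics import fmean


def permutation_p_value(treated, control, n_resamples=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper. The result has a permutation interpretation only
    when units were actually randomized to groups.
    """
    rng = random.Random(seed)
    observed = abs(fmean(treated) - fmean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # one random re-assignment of the same scores
        diff = abs(fmean(pooled[:n_treated]) - fmean(pooled[n_treated:]))
        if diff >= observed:
            as_extreme += 1
    # Counting the observed assignment keeps the estimate strictly above zero.
    return (as_extreme + 1) / (n_resamples + 1)
```

Applied to groups that were formed by selection rather than by chance, as in Follow Through, the same calculation still returns a number, but that number no longer describes the chance process that actually produced the groups.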
There is nothing inherently inadequate in quasi-experimental methodology; indeed, when handled well it is an impressive collection of tools (Cook and Campbell, 1979). Much criticism has been leveled against the analysis of covariance (ANCOVA), and yet many statistical studies of bias have shown covariance adjustments to be among the best available (Rubin, 1973; Cochran and Rubin, 1973). Different methods of correcting for fallible covariates provide plausible bounds for a true estimate of treatment effects. Nevertheless, we will briefly rehash the issue of fallible covariates, considering the case of a single covariate (with multiple covariates, a single "best" linear combination can be formed).




As illustrated in Figure 1, bias can be characterized by a discontinuity in a linear function relating the ideal to the actual covariate. "Reliability" corrections may increase the slope of the within-groups regressions and thus reduce the discontinuity. An underestimate of reliability, though, increases the slopes too much, resulting in an overcorrection. One is left with the problem of determining the appropriate type of reliability. Internal consistency, stability or generalizability coefficients do not insure the relevance (Cronbach, 1977) of the covariate. One may do better to think of the covariate as a linear model of the rules or process by which individuals were assigned to groups (e.g., Follow Through and non-Follow Through). The more accurately the ideal assignment rule is approximated, the more accurate the covariance adjustment (Cronbach et al., 1977).
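The under- and overcorrection described above can be seen in a small simulation. The sketch assumes hypothetical data in which there is no treatment effect at all, the Follow Through group starts one standard deviation lower on a true covariate, and the observed covariate has within-group reliability .75. Dividing the within-group slope by an assumed reliability is one simple kind of "reliability" correction, chosen here for illustration; the function names and all numerical values are our assumptions.

```python
import random
from statistics import fmean


def simulate(n=20_000, gap=1.0, reliability=0.75, seed=1):
    """Hypothetical data with NO true treatment effect.

    The true covariate T has variance 1 within each group; the observed
    covariate X = T + error has within-group reliability `reliability`;
    the outcome Y depends on T only.
    """
    rng = random.Random(seed)
    error_sd = ((1 - reliability) / reliability) ** 0.5
    groups = {}
    for label, true_mean in (("FT", -gap), ("NFT", 0.0)):
        t = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        x = [ti + rng.gauss(0.0, error_sd) for ti in t]  # fallible covariate
        y = [ti + rng.gauss(0.0, 0.5) for ti in t]       # outcome, no effect
        groups[label] = (x, y)
    return groups


def within_group_slope(groups):
    """Pooled within-groups regression slope of Y on X."""
    sxy = sxx = 0.0
    for x, y in groups.values():
        mx, my = fmean(x), fmean(y)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx += sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def adjusted_effect(groups, assumed_reliability=1.0):
    """Covariance-adjusted mean difference, FT minus NFT.

    assumed_reliability=1.0 gives ordinary ANCOVA; smaller values steepen
    the within-group slope (a simple true-score correction).
    """
    slope = within_group_slope(groups) / assumed_reliability
    (x_ft, y_ft), (x_nft, y_nft) = groups["FT"], groups["NFT"]
    return (fmean(y_ft) - fmean(y_nft)) - slope * (fmean(x_ft) - fmean(x_nft))
```

Under these assumptions the expected results are about -0.25 for `adjusted_effect(simulate())` (a spurious "harmful" effect left by the attenuated slope), about 0 with `assumed_reliability=0.75`, and about +0.50 with an underestimated reliability of .50, the overcorrection the text warns against.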
Discriminant analysis can be used to assess assignment rules in terms of errors of misclassification. When this analysis was applied to the data from the Abt Follow Through evaluation, it was found that pupils could be assigned to their appropriate Follow Through or non-Follow Through (control) group with an error rate of only 20% (using the observed covariates in the discriminant function to make the classification) (Camilli, 1980). Considering the guidelines for selecting Follow Through pupils, this error rate is surprisingly low. Thus, there is evidence that covariance adjustments of the kind applied in past Follow Through evaluations may be adequate.
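A discriminant analysis of this kind can be sketched as follows. The code fits Fisher's linear discriminant to two groups of pupils measured on two covariates, assuming equal prior probabilities, and reports the proportion misclassified when the same pupils are classified again (the resubstitution error rate, which tends to be optimistic). The two-covariate setup, the function names and the simulated pupils are hypothetical; we do not know the exact covariates or procedure used by Camilli (1980).

```python
import random
from statistics import fmean


def make_group(n, mean, seed):
    """Hypothetical pupils: two independent normal covariates, SD 1."""
    rng = random.Random(seed)
    return [(rng.gauss(mean[0], 1.0), rng.gauss(mean[1], 1.0)) for _ in range(n)]


def fisher_lda(group_a, group_b):
    """Fisher's linear discriminant for two groups of (x1, x2) points.

    Returns weights w and a cutoff; points scoring above the cutoff are
    assigned to group_a (equal priors assumed).
    """
    ma = [fmean(p[k] for p in group_a) for k in range(2)]
    mb = [fmean(p[k] for p in group_b) for k in range(2)]
    # Pooled within-group scatter matrix S (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts, m in ((group_a, ma), (group_b, mb)):
        for p in pts:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    w = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]  # S^-1 (ma - mb)
    cutoff = 0.5 * sum(w[k] * (ma[k] + mb[k]) for k in range(2))  # midpoint of projected means
    return w, cutoff


def error_rate(w, cutoff, group_a, group_b):
    """Proportion of points assigned to the wrong group."""
    def score(p):
        return w[0] * p[0] + w[1] * p[1]
    wrong = sum(score(p) <= cutoff for p in group_a)
    wrong += sum(score(p) > cutoff for p in group_b)
    return wrong / (len(group_a) + len(group_b))
```

For example, with hypothetical groups whose means differ by 1.5 standard deviations on each covariate, the theoretical error rate under normality is about 14%; the lower the error rate, the more closely the observed covariates reproduce the rule by which pupils were actually assigned.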
But for a decent quantitative experimental evaluation to be possible, adequate measures of program success are needed. Here, past efforts broke down. First some technical problems, then the tough problems.

In the Abt Follow Through evaluation, the logistics of mass testing overwhelmed the effort. The actual testing was not standardized across sites and models; moreover, testing itself was valued differently in different places and by different persons. The result was some interesting features in the data that heretofore have not been sufficiently noticed nor commented upon. For example, for non-Follow Through pupils, the percentage of the pupils scoring below chance on some Metropolitan Achievement Test (MAT) subtests varied from 5% to 35% across the models. Also, in Follow Through groups, relatively large proportions of children scored zero on multiple-choice subtests of 35 items, four options per item. Even more interesting, perhaps, is the relation between the percentage of non-Follow Through scores below chance and model gains. In Figure 2, although only 14 models are plotted, a strong positive relationship is observed for MAT reading. Furthermore, when the percentage of Follow Through pupils below chance is partialed out, the strength of the relationship does not decrease (Camilli, 1980). Thus, the invalidity of the testing procedure is evident in the test scores themselves.
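The arithmetic behind "below chance" and zero scores on a 35-item, four-option subtest is simple, and a short sketch makes it explicit. Assuming a pupil guesses blindly on every item (a binomial model with success probability 1/4 per item), the expected score is 35/4 = 8.75, and the probability of scoring zero is 0.75^35, roughly 4 in 100,000, so a sizable share of zero scores is hard to reconcile even with blind guessing. The sketch also includes the textbook first-order partial correlation, one common way of "partialing out" a third variable; we do not know exactly how Camilli (1980) carried out that step, and the function names are hypothetical.

```python
from math import comb


def chance_score(n_items, n_options):
    """Expected number correct if every item is a blind guess."""
    return n_items / n_options


def prob_score_at_most(k, n_items, n_options):
    """P(score <= k) under blind guessing (binomial, p = 1 / n_options)."""
    p = 1 / n_options
    return sum(comb(n_items, j) * p**j * (1 - p) ** (n_items - j) for j in range(k + 1))


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz**2) * (1 - r_yz**2)) ** 0.5
```

Here `prob_score_at_most(0, 35, 4)` gives the chance of a zero score from pure guessing; in `partial_corr`, x and y would be model gains and the percentage of non-Follow Through scores below chance, and z the percentage of Follow Through pupils below chance.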

In going so far as to comment on these technical issues, we risk creating the impression that we believe the key questions about Follow Through evaluation are technical or amenable to technical solutions. We do not. The important problems with Follow Through evaluation are not technical. They will not be solved with Rasch models or factor analysis or "principal-axis adjustments" (which Wisler, Burns and Iwamoto, 1978, believe would be a useful addition to future Follow Through evaluations). The problems will not be solved by eliminating test scores below chance levels; testing itself is not valued equally in different Follow Through models. Some see it as an obnoxious intrusion; others drill pupils for weeks on item forms. Test results must not be the sole or even primary indicators of success. To invest large amounts of money in attempts to synthesize test results for "education managers" is indefensible. Mass testing is a growing federal tendency. It exists primarily as an attempt to simplify intrinsically complex issues.

Figure 2. Plot of treatment effects against percentage of NFT scores below chance.

The art of teaching must not be subordinated to the technology of mass testing. These tests homogenize educational goals and aspirations, an ironic fate for a program founded on the slogan "planned variation."
3. Discovering New Knowledge About Follow Through.

"New" Knowledge. There is much rhetoric in NIE prospectuses for Follow Through evaluation about "new models" of early compensatory education. We see nothing on the horizon that would justify this optimistic expectation. We suspect that the rhetoric derives either from vain hopes or from the need of NIE to justify its 20% of Follow Through funds in terms of its mission, viz., research, not service.

To anyone but a gullible professional educator caught up in enthusiasm for one gimmick or another, the idea that there are even 22 "models" of early compensatory education is ludicrous. Indeed, these 22 exhaust (with much redundancy) the pedagogic imagination: Pestalozzi via Montessori; Freud via Bank Street; Watson via Skinner; not to mention Jesus Christ via Thomas Aquinas (since Follow Through is a public project). Does anyone really believe that there is something all that remarkable hiding in the bushes that will be discovered by federal evaluation programs? NIE planning documents refer to "media," and "home learning," and the new problems attendant on the rise of "single-parent families." This all sounds quite up-to-date; but none of it rings true. We do not doubt in the least that a dozen educators can be found who can generate excitement about the prospects of some psycholinguistic trick or the promise of "cognitive psychology." But we doubt seriously that an educator can be found who is capable of improving on the variegated state of the art already represented by the two dozen extant Follow Through models. We don't need more Follow Through models; we need more thinking and less measuring on the models that already exist. A brilliant young psychoanalyst and Marxist in Berlin in the 1920's, Siegfried Bernfeld, commented as follows on the historical fads that swept the educational world of early twentieth century Europe:

"All of the educational means thought to be appropriate for changing ... the child's naive and intuitive personality into some higher form are suspiciously simple and trite. Jointly and individually, they are not new; thus is their banality revealed. Nor is it likely possible to devise a genuinely new method of education. Certainly the great pedagogues have not succeeded. It matters little whether today they recommend the power of love or strict discipline, whether they recommend teaching by word or by example or by the rod, whether they demand the teacher's active involvement or his patient attendance, whether they insist on the rechanneled acting out of the child's impulses or their repression. Ever since there were parents and teachers, the ancient gamut from a mere stern glance to prison punishment have all been tried. Children came to be the multitude by motley combinations of these methods, and more multitudes were raised by this multitude; there can be no combinations that have not already been tried, and the result is the mankind of today, of any day; the banal and commonplace methods possess none of the power to transform and perfect which the great pedagogues ascribe to them. There is no magic, neither in the teacher's gentle rebuke nor in the salutary thrashing." (Bernfeld, 1928, p. 38; translation by the first author.)
Knowledge and Human Interests. Many planners of evaluations are curiously inconsistent in the way they treat potential audiences for their findings and the way they treat themselves. Those who argue vigorously for the superiority of quantitative experimental evaluation (like that which characterized Title I and Follow Through evaluation in USOE) fail to realize that their own favorable evaluation of this method is not itself based on evidence from quantitative experimental evaluations. They cannot point to experiments that show quantitative experimental evaluations to be superior to other kinds (e.g., to ethnographic/case-study evaluations, evaluations by personal experience, evaluations of other than cost-benefit issues); rather, they judge the superiority of their chosen methods on the basis of personal experience, loosely analogous experience, historical analyses and other non-quantitative, non-experimental ways of knowing and evaluating. It would be inappropriate to pursue an investigation of personal motives too far, but it seems safe to say that some people's preference for quantitative experimental evaluation is little more than an expression of personal taste; they fear the future, they fear ambiguity, and they like a neat desk, tidy plans and simple answers. And yet, when it comes to the question of other persons' right to decide how to teach children, these same evaluators deny others the right to decide on grounds epistemologically equivalent to those on which the evaluators choose their evaluation methods. If it were not that so many people were intimidated by the evaluators' methods, their arbitrary authority would more quickly be seen as illegitimate.

Consider the following situation. You have before you two pieces of information about a Follow Through model: a) an experimental study showing that this particular model produces average achievement .10 standard deviations higher than conventional teaching; b) a complete descriptive study of the model's goals, aspirations and procedures, the feelings and reactions of teachers and parents who have worked with the model for five years, a critique of the model by educational philosophers and methodologists who have seen the model work, and other pertinent observations of a resident observer of the model which we cannot list since they cannot be anticipated in detail. Which of the two reports will you accord greater weight? Before answering, bear this in mind. The Church of Scientology and the Maharishi International University (Fairfield, Iowa, in case you think we made it up) can cite experimental studies validating their approach that show effect sizes bigger than the .10 effects typical of the impact of the Follow Through models on achievement measures (Ferguson, 1980). Yet, are you in personal danger of enrolling soon in either Scientology or MIU? We doubt it. "Well," you say, "the experimental effects of Transcendental Meditation are only a small part of what is entailed in a decision to matriculate at MIU." "Indeed, even as many people feel that Metropolitan Achievement Test scores are only a small part of judging Follow Through models," we respond. When we confront a significant choice (career, marriage, family, political party, friendships and the like), we worry about what will be expected of us as persons and how that accords with our feelings about our integrity, our contribution to our loved ones and friends, our happiness and our moral obligations. We worry little about .10 effects on measurable variables.
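To put an effect of .10 standard deviations in more familiar terms: if achievement scores are roughly normal with equal spread in both groups (an assumption), the average pupil in the model scores at about the 54th percentile of the comparison group, since the standard normal distribution function gives Φ(0.10) ≈ 0.54. A minimal sketch of the conversion, with hypothetical function names:

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def percentile_of_average_treated_pupil(effect_size):
    """Comparison-group percentile reached by the average treated pupil.

    Assumes normally distributed scores with equal standard deviations
    in both groups.
    """
    return 100.0 * normal_cdf(effect_size)


print(round(percentile_of_average_treated_pupil(0.10), 1))  # 54.0
```

A .10 effect, in other words, moves the typical pupil about four percentile points.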

The audience for Follow Through evaluation is those professional educators in schools who worry about teaching young poor children. They are not interested in evaluators' experiments nor education managers' edicts. They decide how to teach on the basis of enormously complex and partly private attractions and antipathies. And, dear reader, if you feel inclined to scorn their unscientific and irrational minds, reflect again on the fact that you do not want your child to enroll at Maharishi International University, or that the Scarsdale diet appeals to you far more than hypnosis as the way to lose 20 pounds because the idea of hypnosis strikes you as weird and unsettling.

The audience for Follow Through evaluation wants to know much more than experiments and measurements can tell. They want to know what is expected of teachers who use this Follow Through model. Is it consistent with their view of themselves as professionals, as saviors of poor children, as "instructional managers"? Does this model treat pupils as though they were robots, or delicate flowers, or children of God? If they, the teachers, adopt the prescribed role, will they grow to be like Jean Piaget, or Maria Montessori, or Anna Freud, or Siggy Engelmann? Are teachers treated as intelligent human beings or merely as means toward technically prescribed ends and instruments of someone else's will? What do people really want to know about Direct Instruction, to pick an example? They don't care whether DI can coach pupils to spell more words correctly than can Bank Street. They want to know if there is any substance to the rumors that DI is psychological torture for the children who go through it or if DI teachers grow to feel demeaned and superfluous. One does not answer these questions adequately by asking Becker and Engelmann nor by administering the Metropolitan Achievement Test. Likewise, people want to know what kind of personalities their children would be exposed to if they were enrolled in a Bank Street program, with its mildly unsettling aroma of Freudianism. Such are people's concerns. They cannot be discounted or ignored on grounds of pedagogic efficiency, cost-effectiveness, democratic decision-making or the rational conduct of public affairs, each of which is a value honored by those who presume to evaluate Follow Through.

Worthwhile Knowledge About Education. In complex evolutionary systems like education, it is generally more important to evaluate an image of the future than to evaluate current accomplishments (Boulding, 1978). The key perception of value may be nearer to the recognition of potential than the confirmation of current productivity. Educational technologists are fond of pointing out, correctly, that the steam engine lost its first race with a horse. How, in 1970, should one have judged the value of stock in Post Slide-rule Company vs. Texas Instruments? It is the dependence of valuations of educational enterprises on their images of the future and the low predictability of these futures that make educational evaluation such a risky business (Glass, 1979).

HOW SHOULD NIE EVALUATE FOLLOW THROUGH?

House (1980) criticized the model of evaluation that grew up during the 1970's in USOE and now threatens to infect NIE's efforts to evaluate Follow Through:

"Federal evaluation policy has been based on the systems analysis approach. Its major audiences are managers and economists. It assumes consensus on goals, on known cause and effect, and on a few quantified outcome variables. Its methodology includes planned variation experimentation and cost-benefit analysis. Its end is efficiency. It asks the question, What are the most efficient programs?

"As articulated by major proponents like Rivlin and Evans, it assumes there is a direct parallel between the production of social services and manufacturing. The same analysis techniques will apply. The only true knowledge is a production function specifying stable relationships between input and output. The only way to such knowledge is through experimental methods and statistical techniques. It is possible to agree on a few output measures. The issue is efficient allocation of resources.

"The key decisions will be made at higher government levels, and tough management can do the job. The ultimate justification is utilitarian: to maximize satisfaction in society. To maximize, one must know which programs are most efficient. This can be done only by comparing alternatives, for which one must have a common measure of output. This is a job for experts.

"There are places where this approach can be applied successfully. But the United States as a whole is not one of them. The approach can be successfully applied where there really are only a few goals and outcome measures. This is likely to happen where the audience for the evaluation is very narrowly defined and agrees on a few criteria of comparison. It also helps if the criteria can be represented by a reasonably valid quantitative indicator." (House, 1980, p. 222)

The USOE evaluation of Follow Through took ten years arid cost $20 
milliori; it was riot worth the money. Arid thbse whb were jDrimarily respon- 
sible fbr its fbrm (wherever they are tbday)' remain doggedly unrepentarit. 

"the identification of successful sites, combined with the
often weak or variable model effects, suggests that local conditions,
such as children whose needs match espe[...]
can provide, local variations of the model, or especially skilled
teachers, were more apt to determine success than the models used.
We do not think this means that work on educational models like those
implemented in Follow Through should be abandoned. A few models had
results consistent enough to warrant continuing development and testing
of these and other models. It is possible that more models might
have shown positive results if they had been more precisely specified
at the outset and more faithfully and uniformly implemented in the
school setting.

"For this and other reasons, we think that Follow Through has
not been a fair test of whether or not we can learn from a large-scale
educational experiment. Launched hastily because of an
unexpected turn in congressional appropriations, the Follow Through
experiment never really righted itself. Nonetheless, because of
the accountability movement in education, the potential for running
sound experiments may be even better today than it was in 1968."
(Wisler, Burns & Iwamoto, 1978, p. 180)

"... we disagree that there should be less government control
of evaluations generally. The early difficulties with the Follow
Through evaluation can be traced to lack of sound and strong direction
from OE, not to interference from the government. The evaluation was
salvaged only after Garry McDaniels and others assumed control of
it in 1971. Our experience with other federal education evaluations
suggests that weak direction from OE is a sure guarantee of a useless
evaluation. We also disagree with the conclusion that the type of
evaluation used for Follow Through is no longer needed." (Wisler,
Burns & Iwamoto, 1978, p. 179)

The course on which NIE evaluation of Follow Through is set (or will
soon be set) threatens to honor, unwittingly perhaps, the values of "science"
as they are viewed by logical positivists (particularly behavioral psychologists,
who almost alone among observers are pleased with the results of
past attempts to evaluate Follow Through). These values are described by
their friends as "objectivity through operationism" and by their enemies
as "Fliegenbeinenzahlen" (literally "counting flies' legs" or, figuratively,
trivializing overquantification). In a democracy, there is a great
range of values that must be honored by those who presume to evaluate in
the public interest; and those values go far beyond what is now measurable.
I am referring to such things as dignity, respect and love. And the thought
that these are merely multivariate outcome variables that will yield their
secrets to the scientific coaxing of factor analysis is a thought hopelessly
held prisoner by the shackles of logical positivism.

This is a disheartening future for Follow Through evaluation if it is
indeed the future that NIE is in danger of bringing about. But it is precisely
what is to be expected because NIE is going back to the same experts
who gave USOE the same old advice about measurement, design and analysis
(only now the advice is propped up with false hope and excuses for past
failures).

The convergence of past Follow Through evaluations on the common, easily
measurable outcomes is having the unwholesome effect of homogenizing the
evolution of programs. In education as elsewhere, the old adage holds true:
enemies grow to resemble each other. Where organizations and people fight
in a zero-sum battle for the same resources, in time they grow increasingly
alike. Thus the damage wrought by evaluation on a few criteria that are
currently prepared for mass testing is doubly serious. It is not only
unfair to contemporary efforts whose benefits are poorly understood, but
it warps the evolution of efforts that might otherwise have made unanticipated
accomplishments.

NIE is dangerously close to believing the history that USOE writes
about its evaluation experiences. Already the NIE plans for Follow Through
evaluation smack of the USOE model. Quoting from the October 1, 1980,
"Plans for Follow Through Research and Development":

"As one cohort of approaches is fully tested, it will be
phased out of funding, results will be disseminated, and another
cohort of approaches will be phased in. Through this strategy,
it is planned to continually infuse the Follow Through Program
with new research-based knowledge to improve its effectiveness."
(p. 7)

"... NIE will test a small number of approaches to school
improvement in the management and implementation area and document
their effectiveness with sufficient detail so that the results are
replicable for widespread dissemination in Follow Through and
elsewhere." (p. 12)

The conception of evaluation that seems to underlie the NIE planning
document does not accord with the reality of how schools change or how
educators create and grow. And worse yet, this reality is increasingly
difficult to discern because the federal government is changing schools
to accord with its own image of how knowledge should be produced,
disseminated and used. By controlling Dissemination Panels, the "validating"
of programs, and money to induce schools to join a system of knowledge
production and use, the federal government risks changing schooling
into the image of its own conceptualization and risks the loss of value
in a broader and truer sense. Planning and control tend to create
self-confirming futures and destroy alternative futures; alternatives
(variations) are essential to growth and change.

How should NIE evaluate Follow Through? As it has never before
been evaluated.

1. NIE should dispense with the fiction that the purpose of
Follow Through evaluation is to validate and invalidate
models. Indeed, it should admit that the continued existence
of approaches to teaching poor children does not
depend on government-sponsored field experiments.

2. NIE should disabuse itself of the myth that "new models" are
likely to be "discovered" by any methods, particularly by
the methods of quantitative experimental design.

3. Instead of imitating past efforts, the NIE should conduct
evaluation that emphasizes description (principally qualitative)
for informed choice. Models should be described in
terms that people consider personally significant when they
choose a particular profession for themselves or a school
for their children. (By contrast, the language of current
NIE planning documents is technocratic, behavioristic and
anti-democratic.) An ethnographic or case-study approach
to evaluation should be adopted in place of a quantitative,
experimental field trial. What one needs to know about Follow Through
models is not more statistics (these exist in abundance) but rather

a) coherent, detailed portrayals of life in school for
pupils, teachers and parents as it is colored and
shaped by allegiance to a particular Follow Through
model;

b) such portrayals having been written by disinterested,
expert ethnographers with at least two years on-site
for data collection; and,

c) such portrayals being focused on a broad range of concerns
including the model's philosophy, its history
(since its future must be projected), techniques,
financial and psychic costs, side-effects and after-effects,
the roles it requires people to play, its
potential for a favorable evolution, and the like.
SUMMARY

Past evaluations of Follow Through were quantitative and experimental. 
They created much dissent and changed few minds. "Models" of compensatory
education are minor influences in pupils' development. More important in 
children's growth are their native endowment, their health, how their 
parents and siblings treat them, and other influences not controlled by 
schools. 

The deficiencies of quantitative, experimental evaluation are thorough 
and irreparable. The problem lies less with experimental designs for 
assessing causal impact than with the impossibility of translating complex, 
subtle and vague notions of child development and education into tests for
mass administration. 

There are probably at most a half-dozen genuinely and importantly
different approaches to teaching children, and these are already
well represented in existing Follow Through models.

The audience for Follow Through evaluations is an audience of teachers.
This audience does not need the statistical findings of experiments when
deciding how best to educate children. They decide such matters on the
basis of complicated public and private understandings, beliefs, motives
and wishes. They have the right and good reasons so to decide.

The course on which NIE evaluation of Follow Through is set threatens
to honor, unwittingly perhaps, the values of "science" as they are viewed
by logical positivists, mainly behavioral psychologists. There is a greater
range of values that in a democracy must be honored by those who presume
to evaluate in the public interest; and those values go far beyond what is
now measurable.
NIE should dispense with the fiction that the purpose of Follow
Through evaluation is to validate and invalidate models. Indeed, it
should admit that the continued existence of approaches to teaching poor
children does not depend on government-sponsored field experiments.

NIE should disabuse itself of the myth that "new models" are likely
to be "discovered" by any methods, particularly by the methods of
quantitative experimental design.

In place of past efforts, the NIE should conduct evaluation that
emphasizes description (principally qualitative) for informed choice.
Models should be described in terms that people consider personally
significant when they choose a particular profession for themselves
or a school for their children. (By contrast, the language of current
NIE planning documents is technocratic and anti-democratic.) An ethnographic
or case-study approach to evaluation should be adopted in place of a
quantitative, experimental field trial. What one needs now is not more
statistics but rather

a) coherent, detailed portrayals of life in school for pupils,
teachers and parents as it is colored and shaped by allegiance
to a particular Follow Through model;

b) written by disinterested, expert ethnographers with at least
two years on-site for data collection; and,

c) focused on a broad range of concerns including the model's
philosophy, its history (since its future must be projected),
techniques, financial and psychic costs, side-effects and
after-effects, the roles it requires people to play, its
potential for a favorable evolution, and the like.
APPENDIX

In their new book Toward Reform of Program Evaluation, Cronbach and his
associates (1980) listed 95 theses about the proper roles, methods and uses
of evaluation. Although they invited readers to discuss the theses and
sharpen their thinking on them, there is no mistaking the fact that these
declarative statements represent the results of the group's best thinking
about evaluation. And that thinking is remarkably broad and perspicacious.
Moreover, the theses provide an excellent background against which to critique
the thinking on evaluation that characterized USOE efforts in Title I and
Follow Through evaluation in recent years.

In the list that follows, we have marked with an asterisk each assertion
that clearly runs counter to the federal model of program evaluation that
came to characterize USOE and threatens to influence NIE.

Ninety-Five Theses 

1. Program evaluation is a process by which society learns
about itself.

* 2. Program evaluations should contribute to enlightened discussion
of alternative plans for social action.
3. Evaluation is a handmaiden to gradualism; it is both conservative
and committed to change.
* 4. An evaluation of a particular program is only an episode in
the continuing evolution of thought about a problem area.
5. The better and the more widely the workings of social programs
are understood, the more rapidly policy will evolve
and the more the programs will contribute to a better quality
of life.

6. Observations of social programs require a closer analysis
than a lay interpreter can make, for unassisted judgment
leads all too easily to false interpretations.

7. In debates over controversial programs, liars figure and figures
often lie; the evaluator has a responsibility to protect
his clients from both types of deception.

* * * 

8. Ideally, every evaluation will inform the social system and
improve its operations, but everyone agrees that evaluation
is not rendering the service it should.

9. Commissioners of evaluations complain that the messages
from evaluations are not useful, while evaluators complain
that the messages are not used.

* * * 

10. The evaluator has political influence even when he does not
aspire to it.

* 11. A theory of evaluation must be as much a theory of political
interaction as it is a theory of how to determine facts.

* 12. The hope that an evaluation will provide unequivocal
answers, convincing enough to extinguish controversy
about the merits of a social program, is certain to be disappointed.

* 13. The evaluator's professional conclusions cannot substitute
for the political process.
14. The distinction between evaluation and policy research is
disappearing.

* * *

* 15. Accountability emphasizes looking back in order to assign
praise or blame; evaluation is better used to understand
events and processes for the sake of guiding future activities.
16. Social renovations disappoint even their architects.
17. Time and again, political passion has been a driving spirit behind
a call for rational analysis.
* 18. A demand for accountability is a sign of pathology in the
political system.

* * * 

* 19. An open society becomes a closed society when only the
officials know what is going on. Insofar as information is
a source of power, evaluations carried out to inform a policy
maker have a disenfranchising effect.

* 20. The ideal of efficiency in government is in tension with the
ideal of democratic participation; rationalism is dangerously
close to totalitarianism.

* 21. The notion of the evaluator as a superman who will make all
social choices easy and all programs efficient, turning public
management into a technology, is a pipe dream.

* 22. A context of command, with a manager in firm control, has
been assumed in nearly all previous theories of evaluation.

23. An image of pluralistic accommodation more truly represents
how policy and programs are shaped than does the Platonic
image of concentrated power and responsibility.

24. The evaluator must learn to serve in contexts of accommodation
and not dream idly of serving a Platonic guardian.

* 25. In a context of accommodation, the evaluator cannot expect
a "go/no-go" decision to turn on his assessment of
outcomes.

* * *

* 26. What is needed is information that supports negotiation
rather than information calculated to point out the "correct"
decision.

27. Events move forward by piecemeal adaptations.
* 28. It can scarcely be said that decisions about typical programs
are "made"; rather, they emerge.
* 29. The policy-shaping community does not wait for a sure
winner; it must act in the face of uncertainty, settling on
plausible actions that are politically acceptable.

* * *

30. It is unwise for evaluation to focus on whether a project has
"attained its goals."
* 31. Goals are a necessary part of political rhetoric, but all social
programs, even supposedly targeted ones, have broad
aims.

32. Legislators who have sophisticated reasons for keeping goal
statements lofty and nebulous unblushingly ask program
administrators to state explicit goals.
* 33. Unfortunately, whatever the evaluator decides to measure
tends to become a primary goal of program operators.

* * *
34. Evaluators are not encouraged to ask the most trenchant
questions about entrenched programs.

35. "Evaluate this program" is often a vague charge because
a program or a system frequently has no clear boundaries.

* 36. Before the evaluator can plan data collection, he must find
out a great deal about the project as it exists and as it is conceived.

* 37. A good evaluative question invites a differentiated answer
instead of leaving the program plan, the delivery of the program,
and the response of clients as unexamined elements
within a closed black box.

* 38. Strictly honest data collection can generate a misleading picture
unless questions are framed to expose both the facts
useful to partisans of the program and the facts useful to its
critics.

* * *

* 39. Before laying out a design, the evaluator should do considerable
homework. Pertinent questions should be identified by
examining the history of similar programs, the related
social theory, and the expectations of program advocates,
critics, and prospective clients.
40. Precise assessment of outcomes is sensible only after thorough
pilot work has pinned down a highly appropriate form
for an innovation under test.
* 41. When a prototype program is evaluated, the full range of
realizations likely to occur in practice should be observed.

42. Flexibility and diversity are preferable to the rigidity written
into many evaluation contracts.

* * *

43. The evaluator who does not press for productive assignments
and the freedom to carry them out takes the King's shilling
for selfish reasons.

44. The evaluator's aspiration to benefit the larger community
has to be reconciled, sometimes painfully, with commitments
to a sponsor and to informants, with the evaluator's
political convictions, and with his desire to stay in business.

45. Managers have many reasons for wishing to maintain control
over evaluative information; the evaluator can respect all
such reasons that fall within the sphere of management.

46. The crucial ethical problem appears to be freedom to communicate
during and after the study, subject to legitimate
concerns for privacy, national security, and faithfulness to
contractual commitments.

47. With some hesitation, we advise the evaluator to release findings
piecemeal and informally to the audiences that need
them. The impotence that comes with delay may be a greater
risk than the possibility that early returns will be misread.

* * *

48. Nothing makes a larger difference in the use of evaluations
than the personal factor: the interest of officials in learning
from the evaluation and the desire of the evaluator to get
attention for what he knows.

49. Communication overload is a common fault; many an evaluation
is reported with self-defeating thoroughness.

* 50. Much of the most significant communication of findings is
informal, and not all of it is deliberate; some of the most
significant effects are indirect, affecting audiences far removed
from the program under investigation.

51. An evaluation of a particular project has its greatest implications
for projects that will be put in place in the future.

52. A program evaluation that gets attention is likely to affect
the prevailing view of social purposes, whether or not it immediately
affects the fate of the program studied.

* 53. Advice on evaluation typically speaks of an investigation as
a stand-alone study that will draw its conclusions about
a program in complete isolation from other sources of information.

* 54. It is better for an evaluative inquiry to launch a small fleet
of studies than to put all its resources into a single approach.

* 55. Much that is written on evaluation recommends some one
"scientifically rigorous" plan. Evaluations should, however,
take many forms, and less rigorous approaches have value in
many circumstances.

* 56. Results of a program evaluation are so dependent on the setting
that replication is only a figure of speech; the evaluator
is essentially an historian.

* * * 

* 57. An elegant study provides dangerously convincing evidence
when it seems to answer a question that it did not in fact
squarely address.

* 58. Merit lies not in form of inquiry but in relevance of information.
The context of command or accommodation, the
stage of program maturity, and the closeness of the evaluator
to the probable users should all affect the style of an
evaluation.

* 59. The evaluator will be wise not to declare allegiance to either
a quantitative-scientific-summative methodology or a
qualitative-naturalistic-descriptive methodology.
60. External validity (that is, the validity of inferences that go
beyond the data) is the crux; increasing internal validity by
elegant design often reduces relevance.

* * *

* 61. Adding a control costs something in dollars, in attention,
and perhaps in quality of data; a control that fortifies
the study in one respect is likely to weaken it in another.

62. A strictly representative sample may provide less information
than a sample that overrepresents exceptional cases and
deliberately varies realizations.
* 63. The symmetric, nonsequential designs familiar from laboratory
research and survey research are rarely appropriate for
evaluations.

64. Multiple indicators of outcomes reinforce one another logically
as well as statistically. This is true for measures of adequacy
of program implementation as well as for measures of
changes in client behavior.

* * *

65. In project-by-project evaluation, each study analyzes a spoonful
dipped from a sea of uncertainties.

* 66. In any primary statistical investigation, analyses by independent
teams should be made before the report is
distributed.

* 67. Evaluations of a program conducted in parallel by different
teams can capitalize on disparate perspectives and technical
skills.

* 68. The evaluator should allocate investigative resources by considering
four criteria simultaneously: prior uncertainty about
a question, costs of information, anticipated information
yield, and leverage of the information on subsequent thinking
and action.

69. A particular control is warranted if it can be installed at
reasonable costs and if, in the absence of that control,
a positive effect could be persuasively explained away.

70. The importance of comparative data depends on the nature
of the comparison proposed and on the stage of program
maturity.
* 71. When programs have multiple and perhaps dissimilar outcomes,
comparison is invariably judgmental. No technology
for comparing benefits will silence partisan discord.

* * * 

72. Present institutional arrangements for evaluation make it
difficult or impossible to carry on the most useful kinds of
evaluation.

73. In typical federal contracting, many basic research decisions
are made without consulting the evaluators who will do the
work.

* 74. The personal scientific responsibility found in ordinary research
grants is lacking in contract evaluation; the "principal
investigator" is a firm with interchangeable personnel.

75. Though the information from an evaluation is typically not
used at a foreseeable moment to make a foreseen choice, in
many evaluations a deadline set at the start of the study
dominates the effort.

76. Evaluation contracts are increasing in size, but tying many
strands into a single knot is rarely the best way to get useful
information.

* 77. Large-scale evaluations are not necessarily better than smaller
ones.

* 78. Major evaluations should have multiple sponsorship by agencies
with different perspectives.

* 79. Decentralizing much evaluation to the state level would be
a healthy development.

* * *

80. Society will obtain the assistance that evaluations can give
only when there is a strong evaluation profession, clear about
its social role and the nature of its work.

81. There is a boom town excitement in the evaluation community,
but in constant dollars federal funding for evaluation
research has regressed in the last few years.

82. It is inconceivable that evaluators will win their battle for
appropriate responsibilities if they remain unacquainted
with one another, insensitive to their common interests,
and fractionated intellectually.

* * *

83. For any suitably broad social problem, a "social problem
study group" should be set up. It would be charged to inform
itself by weighing, digesting, and interpreting what is known.
It would foster needed investigations and make the policy-shaping
community aware of what is and is not known.

* * *

84. Honesty and balance in program evaluation will be increased
by critical review of the performance of evaluators and
sponsors.

85. Oversight by peers is the most promising means of upholding
professional standards and of precipitating debate about
strategic and tactical issues.

86. The best safeguard against prematurely frozen standards
for evaluative practice is multiple, independent sources
of criticism.

87. There is need for exchanges more energetic than the typical
academic discussion and more responsible than debate
among partisans.

88. Reviews of evaluation should be far more frequent than at
present, and reviews from diverse perspectives should appear
together.

* * *

89. For the prospective evaluator, basic training at the doctoral
level in a specific social science is preferable to training restricted
to evaluation methods.

90. Training in evaluation is too often the stepchild of a department
chiefly engaged in training academicians or providers
of service.

91. Case-study seminars scrutinizing diverse evaluative studies
provide a needed interdisciplinary perspective.

92. Internships with policy agencies that use evaluation sensitize
future evaluators to the realities of evaluation use and
nonuse. These realities are hard to convey in a classroom.

* * *

93. The evaluator is an educator; his success is to be judged by
what others learn.

* 94. Those who shape policy should reach decisions with their
eyes open: it is the evaluator's task to illuminate the situation,
not to dictate the decision.
* 95. Scientific quality is not the principal standard; an evaluation
should aim to be comprehensible, correct and complete, and
credible to partisans on all sides.